TensorFlow Object Detection

by CM


Posted on April 19, 2020



The Goal:

In this article, we will leverage the TensorFlow Object Detection API to detect live traffic. In particular, we will use TensorFlow 2 and OpenCV to run live inference on a video stream. The goal is to use the ssd_mobilenet_v1_coco_2017_11_17 model to detect and ultimately count cars. As mentioned in the Object Detection Basic article, please make sure you have installed the following dependencies: TensorFlow 2.x, OpenCV, and the TF Object Detection API. If you have not installed these dependencies before, sentdex has a nice tutorial on how to install both TF and OpenCV.

Object Detection:

Given an image or a video stream, an object detection model should be able to identify a set of objects as well as their positions within the image. In other words, the pretrained model we use in this article has been trained to detect the presence and location of multiple classes of objects. We will only make use of its ability to detect cars and ignore all other detection classes.
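Models used with the TF Object Detection API report each object's position as a bounding box in normalized coordinates, ordered [ymin, xmin, ymax, xmax] with values between 0 and 1. A minimal sketch of turning such a box into pixel coordinates for a frame of known size; to_pixel_box is a hypothetical helper, not part of the API (the visualization utilities used later do this conversion for you when use_normalized_coordinates=True).

```python
from typing import Sequence, Tuple


def to_pixel_box(box: Sequence[float], width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel (left, top, right, bottom)."""
    ymin, xmin, ymax, xmax = box
    left = int(round(xmin * width))
    top = int(round(ymin * height))
    right = int(round(xmax * width))
    bottom = int(round(ymax * height))
    return left, top, right, bottom


# A box covering the central region of a 640x480 frame.
print(to_pixel_box([0.25, 0.25, 0.75, 0.75], width=640, height=480))  # (160, 120, 480, 360)
```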


Key components are:

Let's first check our TensorFlow version. We plan to build an object detection pipeline with TensorFlow 2.x. Keep in mind that Google's original Object Detection API was written for TF 1.x and is not fully compatible with TF 2.x out of the box. In our case, we are working with version '2.1.0'.

import tensorflow as tf
print(tf.__version__)  # 2.1.0 in our case
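If you want the script to fail early on the wrong major version instead of only printing it, a small guard is enough. A minimal sketch; is_tf2 is a hypothetical helper that only inspects the leading component of the version string.

```python
def is_tf2(version: str) -> bool:
    """Return True for TensorFlow 2.x version strings such as '2.1.0' or '2.4.0-rc1'."""
    major = version.split(".", 1)[0]
    return major == "2"


# With TensorFlow installed you would call it as:
#   if not is_tf2(tf.__version__):
#       raise RuntimeError("This walkthrough expects TensorFlow 2.x")
print(is_tf2("2.1.0"), is_tf2("1.15.0"))  # True False
```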

Let's jump right into the code. First, we import all required dependencies:
(1) pathlib offers classes representing filesystem paths with semantics appropriate for different operating systems.
(2) importlib provides the implementation of Python's import statement and exposes its components, so users can customize the import process. The code below does not actually use it.
(3) numpy adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on them.
(4) OpenCV is a library of programming functions mainly aimed at real-time computer vision.
(5) The TF Object Detection API is an open-source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models.

import pathlib
import importlib
import numpy as np

#OpenCV
import cv2

#TensorFlow Object Detection API
from object_detection.utils import ops as utils_ops
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
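Before running anything, it can help to verify that the third-party packages are importable. A minimal sketch using importlib.util.find_spec; missing_modules is a hypothetical helper. Note that OpenCV is installed as the opencv-python package but imported as cv2, and the Object Detection API is imported as object_detection.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found on the current Python path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# The third-party import names this walkthrough relies on.
missing = missing_modules(["tensorflow", "cv2", "object_detection"])
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All dependencies found.")
```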

Second, as we are using TensorFlow 2.x, we need to patch two module attributes so that the TF 1.x-era utilities of the Object Detection API keep working.

# Point the `tf` name used inside `utils.ops` to the TF1 compatibility API
utils_ops.tf = tf.compat.v1

# Restore the TF1 alias tf.gfile, which lives at tf.io.gfile in TF2
tf.gfile = tf.io.gfile
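Why does assigning utils_ops.tf work at all? Python functions look up module-level names such as tf in their module's namespace at call time, so rebinding the attribute changes what every function in that module sees. A self-contained sketch of the mechanism, assuming (as in the TF1-era API) that object_detection/utils/ops.py refers to TensorFlow through a module-level tf name; fake_ops and api_name are hypothetical stand-ins.

```python
import types

# A stand-in module; its function looks up the global name `tf` at call time,
# the same way functions inside object_detection.utils.ops refer to `tf`.
fake_ops = types.ModuleType("fake_ops")
exec(
    "def api_name():\n"
    "    return tf.name\n",
    fake_ops.__dict__,
)

# The module-level `tf` first points at one namespace...
fake_ops.tf = types.SimpleNamespace(name="tf (TF2 namespace)")
print(fake_ops.api_name())  # tf (TF2 namespace)

# ...after rebinding the attribute, the very same function sees the new object.
fake_ops.tf = types.SimpleNamespace(name="tf.compat.v1")
print(fake_ops.api_name())  # tf.compat.v1
```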

We then define five functions: (1) loading the model, (2) running inference on a single image, (3) reducing the detections to the car class, (4) running inference on the webcam stream, and (5) initializing everything in main().

We start off by building the model-loading function. It downloads and extracts a pretrained model provided by the TensorFlow Object Detection API and loads its serving signature. In addition, we specify the path to the label map that we will later use to map class IDs to names in the image / video stream.

def load_model(model_name):
  base_url = 'http://download.tensorflow.org/models/object_detection/'
  model_file = model_name + '.tar.gz'
  model_dir = tf.keras.utils.get_file(
    fname=model_name,
    origin=base_url + model_file,
    untar=True)

  model_dir = pathlib.Path(model_dir)/"saved_model"

  model = tf.saved_model.load(str(model_dir))
  model = model.signatures['serving_default']

  return model


# Label map used to add the correct label to each box.
PATH_TO_LABELS = 'data/mscoco_label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)
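The category index maps each class ID to a dictionary with an id and a name, which the visualization code uses to label boxes. For illustration only, a simplified regex-based sketch of reading a .pbtxt label map into the same shape; parse_label_map is hypothetical, assumes flat item { ... } blocks without nested messages, and is no substitute for label_map_util, which parses the protobuf properly. The sample is a two-entry excerpt in the style of mscoco_label_map.pbtxt.

```python
import re
from typing import Dict


def parse_label_map(text: str, use_display_name: bool = True) -> Dict[int, dict]:
    """Parse flat `item { ... }` blocks into {id: {'id': id, 'name': label}}."""
    category_index = {}
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.DOTALL):
        # Collect `key: value` pairs, stripping optional double quotes around values.
        fields = dict(re.findall(r'(\w+)\s*:\s*"?([^"\n]*)"?', block))
        class_id = int(fields["id"])
        label = fields.get("display_name") if use_display_name else None
        category_index[class_id] = {"id": class_id, "name": label or fields.get("name")}
    return category_index


SAMPLE = '''
item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}
item {
  name: "/m/0k4j"
  id: 3
  display_name: "car"
}
'''

print(parse_label_map(SAMPLE)[3])  # {'id': 3, 'name': 'car'}
```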

The second function runs inference on a single image. It takes two arguments: the model to run inference with and the input image. As we are working with TensorFlow, we first need to convert the image to an input tensor. Since the model expects a batch of images, we add a batch dimension of size one. All model outputs are batched tensors as well, so we convert them to NumPy arrays and take index [0] to remove the batch dimension, keeping only the first num_detections entries. Note that detection_classes should be integers. Finally, for models that also predict instance masks, the box-level masks are reframed to the full image size and thresholded into binary masks. The SSD MobileNet model used here does not output masks, so this branch is simply skipped.

def run_inference_for_single_image(model, image):
    image = np.asarray(image)
    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis, ...]

    # Run inference
    output_dict = model(input_tensor)

    # All outputs are batched tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(output_dict.pop('num_detections'))
    output_dict = {key: value[0, :num_detections].numpy()
                   for key, value in output_dict.items()}
    output_dict['num_detections'] = num_detections

    # detection_classes should be ints.
    output_dict['detection_classes'] = output_dict['detection_classes'].astype(np.int64)


    # Handle models with masks:
    if 'detection_masks' in output_dict:
        # Reframe the bbox masks to the image size.
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            output_dict['detection_masks'], output_dict['detection_boxes'],
            image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(detection_masks_reframed > 0.5,
                                           tf.uint8)
        output_dict['detection_masks_reframed'] = detection_masks_reframed.numpy()

    return output_dict
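To make the slicing value[0, :num_detections] concrete, a plain-Python sketch on made-up nested lists (not real model output) that mimics the post-processing step: drop the batch dimension, keep only the valid detections, and cast the class IDs to integers. unbatch_detections is a hypothetical name; the real function does the same with NumPy indexing.

```python
from typing import Dict


def unbatch_detections(raw: Dict[str, list], num_detections: int) -> Dict[str, list]:
    """Drop the leading batch dimension and keep only the first num_detections entries."""
    out = {key: value[0][:num_detections] for key, value in raw.items()}
    # Class IDs arrive as floats from the model; cast them to ints for label lookups.
    out["detection_classes"] = [int(c) for c in out["detection_classes"]]
    out["num_detections"] = num_detections
    return out


# A fake batch of size 1 with room for 3 detections, of which only 2 are valid.
raw = {
    "detection_scores": [[0.91, 0.64, 0.0]],
    "detection_classes": [[3.0, 1.0, 0.0]],
    "detection_boxes": [[[0.1, 0.2, 0.4, 0.5], [0.3, 0.3, 0.9, 0.6], [0.0, 0.0, 0.0, 0.0]]],
}
print(unbatch_detections(raw, num_detections=2)["detection_classes"])  # [3, 1]
```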


The third function reduces the detections to a single class. Remember, we only want to detect cars. Hence, we only return the entries of output_dict that belong to the class selected via class_id (in the COCO label map, class ID 3 is "car").

def reduce_to_one_class(output_dict, class_id):
    indices = [i for i, x in enumerate(output_dict['detection_classes']) if x == class_id]
    return {'detection_classes': output_dict['detection_classes'][indices],
            'detection_boxes': output_dict['detection_boxes'][indices],
            'detection_scores': output_dict['detection_scores'][indices],
            'num_detections': len(indices)}
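reduce_to_one_class keeps every detection of the selected class regardless of its confidence; the 0.5 score threshold is applied later, inside the webcam loop. A variant sketch that applies both filters in one pass and returns the matching indices; select_class and min_score are hypothetical names. With NumPy arrays, the returned list can index detection_boxes and detection_scores exactly as indices does in reduce_to_one_class.

```python
from typing import List, Sequence


def select_class(classes: Sequence[int], scores: Sequence[float], class_id: int,
                 min_score: float = 0.5) -> List[int]:
    """Return indices of detections that match class_id and score above min_score."""
    return [i for i, (c, s) in enumerate(zip(classes, scores))
            if c == class_id and s > min_score]


classes = [3, 1, 3, 8, 3]
scores = [0.92, 0.88, 0.41, 0.77, 0.55]
print(select_class(classes, scores, class_id=3))  # [0, 4]
```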

Now we define our run_inference function, which uses the webcam stream as its input.

def run_inference(model, score_threshold=0.5, car_class_id=3):
    # Activate video capture on the default webcam (device index 0).
    cap = cv2.VideoCapture(0)
    total_count = 0         # running total of cars counted so far
    count_before_frame = 0  # number of cars detected in the previous frame

    while True:
        ret, image_np = cap.read()
        if not ret:
            print("No frame received, stopping...")
            break

        # Actual detection, reduced to the car class (ID 3 in the COCO label map).
        output_dict = run_inference_for_single_image(model, image_np)
        output_dict = reduce_to_one_class(output_dict, class_id=car_class_id)

        # Count the car detections that are confident enough.
        scores = output_dict['detection_scores']
        count_current_frame = int(np.sum(scores > score_threshold))
        print('scores: ', scores)

        if count_current_frame > count_before_frame:
            # Only add the cars that were not present in the previous frame.
            total_count += count_current_frame - count_before_frame
            print("New car(s)")
        elif count_current_frame == count_before_frame:
            print("Same car(s)")
        else:
            print("Fewer car(s) / no car")

        count_before_frame = count_current_frame
        print('count in frame:', count_current_frame, '| total:', total_count)

        # Draw boxes and labels onto the frame (in place).
        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            output_dict['detection_boxes'],
            output_dict['detection_classes'],
            output_dict['detection_scores'],
            category_index,
            instance_masks=output_dict.get('detection_masks_reframed', None),
            use_normalized_coordinates=True,
            min_score_thresh=score_threshold,
            line_thickness=4)

        cv2.imshow('object counting', image_np)
        # Press 'q' to quit.
        if cv2.waitKey(25) & 0xFF == ord('q'):
            break

    # Release the webcam and close the display window.
    cap.release()
    cv2.destroyAllWindows()
    return total_count
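The counting logic compares the number of confident car detections in each frame with the previous frame. A stand-alone sketch of that idea, assuming only the increase over the previous frame counts as new cars; count_new_cars is a hypothetical helper that replays a list of per-frame counts.

```python
from typing import Iterable


def count_new_cars(per_frame_counts: Iterable[int]) -> int:
    """Replay the frame-to-frame heuristic: add only the increase over the previous frame."""
    total = 0
    previous = 0
    for current in per_frame_counts:
        if current > previous:
            total += current - previous
        previous = current
    return total


# One car appears, a second joins, both leave, then a third car arrives.
print(count_new_cars([0, 1, 1, 2, 0, 1]))  # 3
```

Because it only compares consecutive frames, this heuristic overcounts when a detection flickers: count_new_cars([1, 0, 1]) returns 2 even if the same car stayed in view. Reliable traffic counts would need an object tracker that follows each car across frames.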

Lastly, we load the model and start the live inference loop in main().

def main():

    model_name = 'ssd_mobilenet_v1_coco_2017_11_17'
    detection_model = load_model(model_name)
    run_inference(detection_model)


if __name__=="__main__":
    main()
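main() hard-codes the model name. If you want to switch models, cameras or the score threshold without editing the code, a command-line parser is one option. A minimal sketch with argparse; the flag names are hypothetical, and passing args.camera to cv2.VideoCapture or args.threshold into the counting loop would require small changes to run_inference.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse hypothetical command-line options for the live detection script."""
    parser = argparse.ArgumentParser(description="Live car detection with the TF Object Detection API.")
    parser.add_argument("--model-name", default="ssd_mobilenet_v1_coco_2017_11_17",
                        help="Model archive name to download and load")
    parser.add_argument("--camera", type=int, default=0, help="OpenCV camera index")
    parser.add_argument("--threshold", type=float, default=0.5,
                        help="Minimum score for a car to be counted")
    return parser.parse_args(argv)


args = parse_args(["--camera", "1"])
print(args.model_name, args.camera, args.threshold)  # ssd_mobilenet_v1_coco_2017_11_17 1 0.5
```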

Give it a second to open the webcam window. Once the window is open, you should see live inference: the model should now detect and count cars.

In this simple tutorial, we used the TensorFlow Object Detection API to run live inference on a webcam stream for a particular object class, including a simple count of the detected objects.

Leverage OpenCV and TensorFlow Object Detection
